A Non-Training Approach to Generating High Semantic Titles for Chinese Documents

Authors

  • Gai-Tai Huang
  • Hsiu-Hsen Yao
Abstract

Given the abundance of available data, manual title generation may become infeasible. Traditionally, information retrieval techniques have been applied to automatic title generation by searching a document for keywords and combining them. However, titles generated through such direct combination may not satisfy Chinese grammatical rules and may not express the semantic meaning explicitly. We therefore propose a non-learning approach, based on a conceptual schema, that generates titles automatically through sentence modification and recombination. Experimental results show that our model can satisfy the requirements of automatic title generation.

Similar resources

Automatic title generation for Chinese spoken documents using an adaptive k nearest-neighbor approach

The purpose of automatic title generation is to understand a document and to summarize it with only several but readable words or phrases. It is important for browsing and retrieving spoken documents, which may be automatically transcribed, but it will be much more helpful if given the titles indicating the content subjects of the documents. For title generation for Chinese language, additional...

Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications

The most attractive form of future network content will be multi-media including speech information, and such speech information usually carries the core concepts for the content. As a result, the spoken documents associated with the multi-media content very possibly can serve as the key for retrieval and browsing. This paper presents a new approach of hierarchical topic organization and visual...

A Multi-layered Summarization System for Multi-media Archives by Understanding and Structuring of Chinese Spoken Documents

The multi-media archives are very difficult to be shown on the screen, and very difficult to retrieve and browse. It is therefore important to develop technologies to summarize the entire archives in the network content to help the user in browsing and retrieval. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least some of the above issues: (1) Autom...

Title generation for spoken broadcast news using a training corpus

The problem of title generation involves finding the essence of a document and expressing it in only a few words. The results of a query to the Informedia Digital Video Library are summarized through an automatically generated title for each retrieved news story. When the document is errorful, as with speech-recognized broadcast news stories, the title creation challenge becomes even greater. W...

Generating an Indoor space routing graph using semantic-geometric method

The development of indoor Location-Based Services faces various challenges, one of which is the method of generating the indoor routing graph. Because purely geometric methods for generating indoor routing graphs have weaknesses, this study proposes a semantic-geometric method to cover the existing gaps in combining semantic and geometric methods. The proposed method uses the CityGML...

Journal:
  • J. Inf. Sci. Eng.

Volume 21  Issue 

Pages  -

Publication date 2005